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In a parallel discrete-event simulation (PDES) scheme, tasks are distributed 
among processing elements (PEs), whose progress is controlled by a synchro- 
nization scheme. For lattice systems with short-range interactions, the progress 
of the conservative PDES scheme is governed by the Kardar-Parisi-Zhang 
equation from the theory of non-equilibrium surface growth. Although the 
simulated (virtual) times of the PEs progress at a nonzero rate, their standard 
deviation (spread) diverges with the number of PEs, hindering efficient data 
collection. We show that weak random interactions among the PEs can make 
this spread nondivergent. The PEs then progress at a nonzero, near-uniform 
rate without requiring global synchronizations. 
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Simulating large systems often leaves the programmer with only one option: parallel dis- 
tributed simulations where parts of the system are allocated and simulated on different pro- 
cessing elements (PE). A large class of interacting systems, including financial market mod- 
els, epidemic models, dynamics of magnetic systems, and queuing networks, can be described 
by a set of local state variables assuming a finite number of possible values. As the system 
evolves in time, the values of the local state variables change at discrete instants, synchronously 
or asynchronously depending on the dynamics of the system. Parallel simulation for the for- 
mer is straightforward (at least conceptually). For the latter, i.e., for asynchronous or non- 
parallel dynamics, one must use some kind of synchronization to ensure causality. The in- 
stantaneous changes in the local configuration are also called discrete events, hence the term 
parallel discrete-event simulations (PDES) (dEHH). Examples of PDES applications include 
dynamic channel allocation in cell-phone communication networks models the spread of 
diseases ©, battle-field simulations (6), and dynamic phenomena in highly anisotropic mag- 
netic systems QIHIl. Here the discrete events are call arrivals, infections, troop movements, and 
changes of the orientation of the local magnetic moments, respectively. As the number of PEs 
on parallel architectures increases to tens of thousands, fundamental questions of the scalability 
of the underlying algorithms must be addressed. Here we show a way to construct fully scal- 
able parallel simulations for systems with asynchronous dynamics and short-range interactions. 
Understanding the effects of the microscopic dynamics (corresponding to the algorithmic syn- 
chronization rules) on the global properties of the simulation scheme brings us to the solution. 
Recently, a similar connection has been made ^ between rollback-based PDES schemes (ITOb 
and self-organized criticality (1771) . 

The two basic ingredients of PDES are the set of local simulated times, often referred to 
as virtual times fTU\, and a synchronization scheme (1). In order for the PDES scheme to be 
scalable (I7?t . (/) the virtual time horizon should progress on average at a nonzero rate; (ii) the 



2 



typical spread of the time horizon should be bounded as the number of PEs NpE goes to infinity. 
The first criterion ensures a nonzero progress rate in the limit of large A^pe- It is, however, not 
sufficient if data are to be collected. Different PEs have progressed to different local simulated 
times with a possibly large spread among them, making measurement a complex task. Fre- 
quent global synchronizations can get costly for large A^pe, whereas temporarily storing a large 
amount of data as a result of the large virtual time spread is limited by the available memory. 
Therefore, (ii) is crucial for the measurement part of the algorithm to be scalable. Here we in- 
troduce a PDES scheme in which the PEs make nonzero and close-to-uniform progress without 
global intervention. 

Li conservative PDES schemes il3\\14\\r5[ . which we focus on, an update is performed by 
a particular PE only if the resulting change in the local configuration of the simulated system 
is guaranteed not to violate causality. Otherwise the PE idles. The efficiency of the scheme 
depends on the fraction of non-idling PEs. It was shown il6\\17h that the virtual time horizon 
exhibits kinetic roughening il8\\19li for the basic conservative scheme applied to systems with 
short-range interactions on regular lattices. In particular, the evolution of the virtual time hori- 
zon is governed by the Kardar-Parisi-Zhang (KPZ) equation (l20t . which plays a central role 
in non-equilibrium surface growth il8\\19h . The above finding has two major implications for 
the asymptotic scalability of the basic conservative PDES scheme il6\\21\i : criterion (/) for the 
scalability is satisfied because the average progress rate of the virtual time horizon approaches 
a nonzero value in the limit A'pe oo. Criterion (ii), however, is violated because the virtual 
time horizon becomes macroscopically rough. 

For illustration, we consider a general one-dimensional system with nearest-neighbor in- 
teractions, in which the discrete events exhibit Poisson asynchrony. In the one-site-per PE sce- 
nario, each site has its own local simulated time, constituting the virtual time horizon {rj (t) . 
Here t is the discrete number of parallel steps executed by all PEs, which is proportional to the 
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wall-clock time. According to the basic conservative synchronization scheme il4l\15l . at each 
parallel step t, only those PEs for which the local simulated time is not greater then the local 
simulated times of their neighbors, can increment their local time by an exponentially dis- 
tributed random amount. (Without loss of generality we assume that the mean of the local time 
increment is one in simulated time units [stu].) Thus, if Ti(t) < min{T:j_i(t), 'rj+i(t)}, PE i 
can update the configuration of the underlying site it carries and determine the time of the next 
event. Otherwise, it idles. Despite its simplicity, this rule preserves unaltered the asynchronous 
causal dynamics of the underlying system jMWlSh . Further, the progress rate of the simulation 
(u(t))7VpE (the density of local minima of the virtual time horizon) approaches a nonzero con- 
stant in the asymptotic long-time, large-iVpE limit <il6l21h . The average width of the virtual time 
horizon, however, diverges as A^pe oo (E^II^l. Specifically, the average width is defined as 
(^'(i))iVpK = ( V^Vpe) (E.=T [r.(t) - nt)f) where r(t) = (l/Np^) E^T r,it). For any finite 
number of PEs, the width grows for early times as ~ t^^, and after a system-size 

dependent cross-over time ~ A^pe it reaches its steady state with {w'^{oo))j^^^ ~ ^pe- Here 
a is the roughness exponent (equal to 1/2 in our example), which quantifies how the average 
width of the surface diverges in the large- ApE limit. The reason for this divergence is that the 
local interaction topology of the PEs mimicks that of the underlying system. Under the basic 
conservative synchronization rules, the PEs form a strongly interacting system in which the 
correlation length reaches the system size A^pe {16). This correlation length is responsible for 
the diverging surface fluctuations captured by the average width. 

To reduce the width, one must de-correlate the fluctuations of the virtual time surface. We 
achieve this by introducing quenched random communication links between the PEs in addition 
to the regular nearest-neighbor interactions of the underlying physical system. At the beginning 
of the simulation we connect each PE to another one, chosen randomly from the rest. Note that 
these random links in addition to the regular lattice connections result in a small- world network 
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(EU, where beyond the nearest neighbors, each PE is connected with a randomly chosen one 
(l25b . In the modified conservative PDES scheme, at every parallel step each PE with probability 
p compares its local simulated time with its full virtual neighborhood and can only advance if 
it is a local minimum, i.e., if rj(t) < min{rj_i(t), Ti^i{t), Tj.{i){t)}, where r{i) is the random 
connection of PE i. With probability (1 — p) each PE follows the original scheme. Note that the 
occasional extra checking of the simulated time of the random neighbor is not needed for the 
faithfulness of the simulation. It is merely introduced to control the width of the time horizon. 
Using a coarse-graining procedure analogous to that used in (173b we find that the large-scale 
properties of the virtual time horizon of our modified scheme are governed by the equation. 



where f(x, t) is the coarse-grained surface-height fluctuation measured from the mean, and the 
temporal and spatial derivatives are the coarse-grained interpretations of finite differences. Sim- 
ilarly, r]{x,t) is a coarse-grained noise, delta-correlated in space and time. The coefficients 7 
and A carry the details of the coarse-graining procedure. In particular, 7(p) is nonzero for all 
nonzero values of p and 7(p) — > as p — > 0, i.e., in this limit we recover the KPZ equation 
for the original scheme. The morphological properties of the surface (the virtual time horizon) 
governed by Eq. [T] are fundamentally different from the one-dimensional KPZ surface for all 
PT^O. The algorithmic rules extended to include the quenched random connections introduce a 
relaxational term (the first term in Eq.[l]) in the evolution of the virtual time horizon. This term 
converts the system into a mean-field growth model, whose time horizon is macroscopically 
smooth, i.e., has a finite width. To support the coarse-graining arguments, we simulated the 
exact stochastic growth process defined by our modified conservative algorithmic rules. Snap- 
shots of the progress of the simulation, i.e., the virtual time horizon, are shown in Fig. 1 for the 
original (p = 0.00) and modified (p = 0.10) cases. 




(1) 
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The structure of the virtual time horizon can be most transparently understood from the 
steady-state structure factor S{k) = (fkf^k) /NpE, where is the spatial Fourier transform 
of the surface fluctuations (l24l) . The presence of the relaxational (first) term in the stochastic 
growth equation, Eq.[ll implies that limfc_»o S{k) < oo, i.e., there are no large-amplitude long- 
wavelength modes in the surface. Consequently, the width, {'w'^)npe = (V^pe) Z^fc^o S{k) is 
also finite. For further understanding we consider only the linear terms in Eq.[l]and obtain 

S{k) cc . (2) 

In this approximation the lateral correlation length ^ of the surface fluctuations is 1/^7^ i-^., 
it is finite for all p ^ 0. From Eq. |2lit also follows that the local slopes of the virtual time 
horizon remain short-range correlated and the utilization {u{oo)) ATp^ (average progress rate) is 
a nonzero constant in the limit NpE oo, just like in the original conservative scheme. The 
behavior of the structure factor S{k) indicates (Fig. 2) that it approaches a finite value as /c — > 0. 
Further, the linearized version of the theory seems to work well in the small-fc regime (Fig. 2 
inset), yielding a finite correlation length (,^ ^ 26 for p = 0. 10) for a long chain of PEs. 

Choosing an infinitesimally small p, the virtual time horizon becomes macroscopically 
smooth with a finite width. At the same time, the utilization is only reduced by an infinites- 
imal amount, as a result of the occasional extra checking with the random neighbors. This 
trade-off is substantially favorable for the conservative PDES scheme: by giving up an in- 
finitesimally small fraction of the utilization, the width is reduced from infinity to a finite value, 
making measurement and data management scalable under the conservative PDES scheme. For 
example, forp = 0.10, (w^(oo))7VpE is reduced from infinity to approximately 5.25 [Fig. 3(A)], 
while {u{oo))npe is reduced by only about 1.6% [Fig. 3(B)]. One can also observe the clear 
self-averaging property for both global observables, the width and the utilization (Fig. 3). From 
a broader statistical physics viewpoint, one can ask whether a small, non-zero fraction q of the 
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PEs with random links (checked, e.g., at every step) is sufficient to control the width. While this 
choice clearly weakens load balancing and the utilization [Fig. 3(B)], our results in Fig. 3(A) 
and recent work on the closely related XY-model on a small-world network d^Tb suggest that 
a finite width is achieved. There is growing evidence that systems without inherent frustration 
exhibit mean-field characteristics when the original short-range interaction topology is modified 
to a small-world network (l2JII25ll26ll^ . 

The generalization when random links are added to a higher-dimensional underlying regular 
lattice is clear: since the one-dimensional case with random links is governed by the mean- 
field equation, in higher dimensions it will be even more so (i.e., the critical dimension (17^ 
of the model with random links is less than one). The generalization for the many-sites-per- 
PE case also follows from universality arguments: without the random connections, there is 
an additional fast-roughening phase for early times when the evolution of the time horizon 
corresponds to random deposition (18). Subsequently, it will cross over to the KPZ growth 
regime and finally saturate. With many sites per PE, the typically desired and efficient way of 
implementing PDES dTIITJt. the saturation value of the width can become extremely large. This 
underscores the importance of implementing the additional synchronizations through random 
links to suppress the roughness of the time horizon. 
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Figure 1: Snapshots of the virtual time horizons in the steady state for the one-site-per PE 
conservative PDES scheme with A^pe = 10, 000 after t = 10^ parallel algorithmic steps for (A) 
the original algorithm (p = 0.00) and (B) the modified one {p = 0.10) with quenched random 
connections. The vertical scale, 80 simulated time units (stu) is the same in (A) and (B). 
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Figure 2: Steady- state structure factor of the virtual time horizon for p = 0.10. In addition to 
ensemble averages over 100 realizations of the random links (filled symbols), single realizations 
(the same open symbols) are also shown. The inset shows a magnified view of 1/ S{k) versus 
k"^ for small k for the largest system. The straight line is the best linear fit used to determine the 
correlation length. 
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Figure 3: (A) Average steady-state width of the virtual time horizon for various values of p for 
the one-site-per-PE PDES scheme. In addition to ensemble averages over 10 realizations of the 
random links (filled symbols), a single realization is also shown (the same open symbols). The 
solid straight line represents the asymptotic one-dimensional KPZ power-law divergence with 
roughness exponent a = 1/2 for the p = case. Note the log-log scales. The g = 0.10 data set 
corresponds to the case when only 10% of the PEs have random links and those are checked at 
every step. (B) The steady-state utilization (fraction of non-idling PEs) for the same cases as in 
(A). 
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